Faster Coordinate Descent via Adaptive Importance Sampling

Authors

  • Dmytro Perekrestenko
  • Volkan Cevher
  • Martin Jaggi
Abstract

Coordinate descent methods employ random partial updates of decision variables in order to solve huge-scale convex optimization problems. In this work, we introduce new adaptive rules for the random selection of their updates. By adaptive, we mean that our selection rules are based on the dual residual or the primal-dual gap estimates and can change at each iteration. We theoretically characterize the performance of our selection rules, demonstrate improvements over the state-of-the-art, and extend our theory and algorithms to general convex objectives. Numerical evidence with hinge-loss support vector machines and Lasso confirms that the practice follows the theory.
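
As a rough illustration of the idea (not the authors' exact algorithm), the sketch below runs coordinate descent on a least-squares objective and, at each iteration, samples the coordinate to update with probability proportional to the magnitude of its partial derivative, a simple stand-in for the dual-residual and duality-gap scores analyzed in the paper. The function name adaptive_cd and all parameters are illustrative.

    # Hedged sketch: adaptive coordinate descent on f(x) = 0.5 * ||A x - b||^2.
    # Coordinates are sampled in proportion to |partial derivative|, a simplified
    # proxy for the paper's dual-residual / duality-gap based selection rules.
    import numpy as np

    def adaptive_cd(A, b, iters=1000, seed=0):
        rng = np.random.default_rng(seed)
        n = A.shape[1]
        L = (A ** 2).sum(axis=0)              # coordinate-wise Lipschitz constants
        x = np.zeros(n)
        r = A @ x - b                         # residual A x - b, updated in place
        for _ in range(iters):
            g = A.T @ r                       # full gradient (cheap proxy here)
            s = np.abs(g)
            if s.sum() == 0:                  # already at the optimum
                break
            i = rng.choice(n, p=s / s.sum())  # adaptive importance sampling
            step = g[i] / L[i]                # exact minimization along coordinate i
            x[i] -= step
            r -= step * A[:, i]               # keep the residual consistent
        return x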

Related papers

Faster Optimization through Adaptive Importance Sampling

The current state-of-the-art stochastic optimization algorithms (SGD, SVRG, SCD, SDCA, etc.) are based on sampling one active data point uniformly at random in each iteration. Changing these probabilities to better reflect the importance of each data point is a natural and powerful idea. In this thesis we analyze Stochastic Coordinate Descent methods with fixed non-uniform and adaptive sampling. ...
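
For context, here is a minimal sketch of the fixed non-uniform baseline such a thesis typically compares against (assumed here to be Lipschitz-based sampling for a least-squares objective; the helper name is illustrative): coordinates with larger coordinate-wise Lipschitz constants L_i are drawn more often.

    # Hedged sketch: fixed non-uniform sampling with p_i proportional to the
    # coordinate-wise Lipschitz constant L_i, computed once before optimization.
    import numpy as np

    def lipschitz_probs(A):
        L = (A ** 2).sum(axis=0)   # L_i for f(x) = 0.5 * ||A x - b||^2
        return L / L.sum()         # sampling distribution, fixed for all iterations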

Safe Adaptive Importance Sampling

Importance sampling has become an indispensable strategy to speed up optimization algorithms for large-scale applications. Improved adaptive variants—using importance values defined by the complete gradient information which changes during optimization—enjoy favorable theoretical properties, but are typically computationally infeasible. In this paper we propose an efficient approximation of gra...
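
One naive way to read the "safe" idea (a simplification under an assumed smoothness bound, not the paper's optimal construction): keep cheap per-coordinate upper bounds on the current gradient and sample from a distribution built on those bounds instead of recomputing the full gradient.

    # Hedged sketch: sample from upper bounds on |grad_i f(x)| derived from a
    # stale gradient plus a Lipschitz drift term, instead of recomputing the
    # full gradient at every step. g_stale, L, and dist_moved are assumed inputs.
    import numpy as np

    def safe_upper_bound_probs(g_stale, L, dist_moved):
        u = np.abs(g_stale) + L * dist_moved   # bound on how far the gradient drifted
        return u / u.sum()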

Stochastic Dual Coordinate Ascent with Adaptive Probabilities

This paper introduces AdaSDCA: an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving regularized empirical risk minimization problems. Our modification consists in allowing the method to adaptively change the probability distribution over the dual variables throughout the iterative process. AdaSDCA achieves a provably better complexity bound than SDCA with the best fixed pr...
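
A hedged sketch of the adaptive-probability step, specialized to ridge-regularized least squares (the loss choice and all names are assumptions, not the paper's general recipe): dual coordinates are sampled in proportion to their dual residuals at the current iterate.

    # Hedged sketch: AdaSDCA-style sampling probabilities for squared loss with
    # L2 regularization. kappa_i = |alpha_i + a_i^T w - y_i| measures how far
    # dual variable alpha_i is from optimality at the current primal point w.
    import numpy as np

    def adaptive_dual_probs(A, y, alpha, lam):
        n = A.shape[0]
        w = A.T @ alpha / (lam * n)          # primal iterate induced by the duals
        kappa = np.abs(alpha + A @ w - y)    # dual residuals for squared loss
        return kappa / kappa.sum()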

Accelerating Stochastic Gradient Descent via Online Learning to Sample

Stochastic Gradient Descent (SGD) is one of the most widely used techniques for online optimization in machine learning. In this work, we accelerate SGD by adaptively learning how to sample the most useful training examples at each time step. First, we show that SGD can be used to learn the best possible sampling distribution of an importance sampling estimator. Second, we show that the samplin...
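
A minimal sketch of the importance-weighted update this line of work builds on (the probability update below just reuses stale per-example gradient norms; the paper's online-learning scheme is more refined, and grad_i is an assumed per-example gradient oracle):

    # Hedged sketch: SGD with non-uniform example sampling. Each step draws
    # example i with probability p_i, reweights the gradient by 1/(n * p_i) to
    # keep the update unbiased, and refreshes the sampled example's norm.
    import numpy as np

    def importance_sgd(grad_i, n, x0, steps=1000, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        x = np.array(x0, dtype=float)
        norms = np.ones(n)                   # running per-example gradient norms
        for _ in range(steps):
            p = norms / norms.sum()
            i = rng.choice(n, p=p)
            g = grad_i(i, x)                 # gradient of the i-th loss term at x
            x -= lr * g / (n * p[i])         # importance-weighted, unbiased step
            norms[i] = max(np.linalg.norm(g), 1e-12)  # avoid zero probabilities
        return x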

Penalized Bregman Divergence Estimation via Coordinate Descent

Variable selection via penalized estimation is appealing for dimension reduction. For penalized linear regression, Efron et al. (2004) introduced the LARS algorithm. Recently, the coordinate descent (CD) algorithm was developed by Friedman et al. (2007) for penalized linear regression and penalized logistic regression and was shown to gain computational superiority. This paper explores...
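
For concreteness, here is the standard coordinate-descent update for the Lasso special case (soft-thresholding, as popularized by Friedman et al.; this is textbook material, not the Bregman-divergence estimator studied in the paper):

    # Hedged sketch: cyclic coordinate descent for
    # min_x 0.5/m * ||A x - b||^2 + lam * ||x||_1 via soft-thresholding.
    import numpy as np

    def soft_threshold(z, t):
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def lasso_cd(A, b, lam, sweeps=100):
        m, n = A.shape
        x = np.zeros(n)
        r = b - A @ x                        # residual b - A x, updated in place
        col_sq = (A ** 2).sum(axis=0) / m    # per-coordinate curvature
        for _ in range(sweeps):
            for j in range(n):
                rho = A[:, j] @ (r + A[:, j] * x[j]) / m  # partial-residual correlation
                x_new = soft_threshold(rho, lam) / col_sq[j]
                r += A[:, j] * (x[j] - x_new)             # keep residual consistent
                x[j] = x_new
        return x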

Publication date: 2017